19 research outputs found

    In-Datacenter Performance Analysis of a Tensor Processing Unit

    Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU. Comment: 17 pages, 11 figures, 8 tables. To appear at the 44th International Symposium on Computer Architecture (ISCA), Toronto, Canada, June 24-28, 201
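    The peak-throughput figure in this abstract can be sanity-checked with a little arithmetic. The sketch below is only illustrative: it assumes each MAC counts as two operations per cycle (one multiply, one add), a convention the abstract does not state, and derives the clock rate that the quoted 92 TOPS would then imply. The function and constant names are hypothetical.

```python
# Figures quoted in the abstract.
MAC_UNITS = 65_536            # 8-bit MACs in the matrix multiply unit
PEAK_OPS_PER_SECOND = 92e12   # 92 TeraOps/second

# Assumption (not stated in the abstract): one MAC = 2 operations per cycle.
OPS_PER_MAC = 2


def implied_clock_hz(peak_ops: float, mac_units: int, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Clock rate at which every MAC firing each cycle yields peak_ops."""
    return peak_ops / (mac_units * ops_per_mac)


clock_hz = implied_clock_hz(PEAK_OPS_PER_SECOND, MAC_UNITS)
print(f"Implied clock: {clock_hz / 1e6:.0f} MHz")  # Implied clock: 702 MHz
```

    Under that assumption the quoted numbers are mutually consistent with a clock of roughly 700 MHz; a different operation-counting convention would change the result proportionally.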

    Latch Optimization in Circuits Generated from High-level Descriptions

    The authors address the problem of efficiently exploring good latch/logic tradeoffs for large designs generated from high-level specifications. They describe algorithms for reducing the number of latches while controlling the size of the intermediate logic. Key-words: register removal, sequential optimization, state assignment, high-level synthesis, reachable states computation (Résumé : tsvp) Acknowledgments: This work was supported in part by the National Science Foundation under grant INT-9505943, and the French GENIE MESR INRIA project. * École Nationale Supérieure des Mines de Paris, Centre de Mathématiques Appliquées 06904 Sophia-Antipolis, FRANCE, Email : [email protected] Unite de recherche INRIA Sophia-Antipolis 2004 route des Lucioles, BP 93, 06902 SOPHIA-ANTIPOLIS Cedex (France) Telephone : (33) 93 65 77 77 -- Telecopie : (33) 93 65 77 65 Optimisation du nombre des registres dans les circuits générés à partir de langages de haut niveau Résumé : Les aut..

    The 2014 European Elections in Romania

    The 2014 European elections in Romania represented a test for the political parties preparing for the presidential elections at the end of the same year. Firstly, we analyze the political context in which the European elections took place. Since 2012 the changing governing coalitions have created an unstable party system with many politicians shifting party allegiances. Several high-ranking party officials were suspected of corruption, and this affected the nomination of candidates. Secondly, we show that although the ideological allegiance of citizens and political parties increased since 2012, the match between the policy preferences of political parties and their supporters continues to be remarkably low. Finally, we discuss several effects of the European elections, including the difficulties that the center-right parties encountered in appointing candidates and creating electoral coalitions for the presidential elections.

    Efficient Latch Optimization Using Exclusive Sets

    Controller circuits synthesized from high-level languages often have many more latches than the minimum, with a resulting sparse reachable state space that has a particular structure. We propose an algorithm that exploits this structure to remove latches. The reachable state set (RSS) is much easier to compute for the new, smaller circuit and can be used to efficiently compute the RSS of the original. Thus we provide a method for obtaining the RSS, and two different initial implementations from which to begin logic optimization. 1 Introduction The computation of the reachable state set (RSS) of a sequential circuit is important for verification, logic optimization and test generation. The RSS computation is typically done incrementally, by looping over a computation of the next states as the image of the current states by a vector of Boolean functions [3]. When BDD-based algorithms are used, the variables are the latches and the circuit inputs. Therefore, the number of latch..
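    The incremental reachable-state computation described in this snippet (repeatedly taking the image of the current states under the next-state functions until nothing new appears) can be sketched as follows. This is a minimal illustration using explicit Python sets rather than the BDD-based representation the snippet mentions, and it is not the authors' latch-removal algorithm. The names `reachable_states` and `ring_counter` and the example circuit are hypothetical.

```python
from itertools import product


def reachable_states(initial, next_state, num_inputs):
    """Fixed-point iteration: add the image of the newly found states until no new state appears.

    initial     -- tuple of booleans, one per latch
    next_state  -- function (state, inputs) -> state
    num_inputs  -- number of Boolean circuit inputs
    """
    reached = {initial}
    frontier = {initial}
    while frontier:
        # Image of the frontier under every possible input vector.
        image = {
            next_state(state, inputs)
            for state in frontier
            for inputs in product((False, True), repeat=num_inputs)
        }
        frontier = image - reached   # keep only states not seen before
        reached |= frontier
    return reached


# Hypothetical example: a 3-latch one-hot ring counter with one "enable" input.
def ring_counter(state, inputs):
    (enable,) = inputs
    if not enable:
        return state
    return (state[-1],) + state[:-1]   # rotate the single hot bit


rss = reachable_states((True, False, False), ring_counter, num_inputs=1)
print(len(rss), "of", 2 ** 3, "states reachable")  # 3 of 8 states reachable
```

    The one-hot counter uses three latches but reaches only three of the eight possible states, a small instance of the sparse reachable state space that the snippet attributes to controllers synthesized from high-level languages.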

    Verifying Synchronous Reactive Systems Programmed in ESTEREL

    Synchronous reactive systems (SRS) form a convenient description model of controllers, either software or hardware. They are based on finite state Mealy machine interpretation, and languages such as Esterel or Lustre allow for their structured programming, including parallel subcomponents. Internal formats such as Blif provide intermediate description at the level of boolean equations, in which both output signal and resulting latch values are expressed as propositional formulae on input signals and previous latch values. Esterel also produces its own such format, named sc format. SRSs allow powerful programming of reactive systems, using synchronous parallelism as the core modularity feature, and signal diffusion for instantaneous communication. The impact for this programming style was amply demonstrated elsewhere. As a trade-off to this expressiveness,..

    Mechanical Properties of Polymer-Based Blanks for Machined Dental Restorations

    The tremendous technological and dental material progress led to a progressive advancement of treatment technologies and materials in restorative dentistry and prosthodontics. In this approach, CAD/CAM restorations have proven to be valuable restorative dental materials in both provisional and definitive restoration, owing to multifarious design, improved and highly tunable mechanical, physical and morphological properties. Thus far, the dentistry market offers a wide range of CAD/CAM restorative dental materials with highly sophisticated design and proper characteristics for a particular clinical problem or multiple dentistry purposes. The main goal of this research study was to comparatively investigate the micro-mechanical properties of various CAD/CAM restorations, which are presented on the market and used in clinical dentistry. Among the investigated dental specimens, hybrid ceramic-based CAD/CAM presented the highest micro-mechanical properties, followed by CAD/CAM PMMA-graphene, while the lowest micro-mechanical features were registered for CAD/CAM multilayered PMMA